SRI Submissions to Chinese-English PatentMT NTCIR10
Authors
Abstract
The SRI team joined the subtask of Chinese-English Patent machine translation evaluation, and submitted the translation results using a combined output from two types of grammars supported in SRInterp, with two different word segmentations. We investigated the effect of adding sparse features, together with several optimization strategies. Also, for the PatentMT domain, we carried out preliminary experiments on adapting language models. Our results showed positive improvements using these approaches.
Similar resources
SRI's Submissions to Chinese-English PatentMT NTCIR10 Evaluation
The SRI team joined the subtask of Chinese-English Patent machine translation evaluation, and submitted the translation results using a combined output from two types of grammars supported in SRInterp [13], with two different word segmentations. We investigated the effect of adding sparse features, together with several optimization strategies. Also, for the PatentMT domain, we carried out preli...
The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10
We describe the statistical machine translation (SMT) systems developed at Heidelberg University for the Chinese-to-English and Japanese-to-English PatentMT subtasks at the NTCIR10 workshop. The core system used in both subtasks is a combination of hierarchical phrase-based translation and discriminative training using either large feature sets and ℓ1/ℓ2 regularization (for Japanese-to-English) ...
NTT-NII Statistical Machine Translation for NTCIR-10 PatentMT
This paper describes details of the NTT-NII system in the NTCIR-10 PatentMT task. The system is an extension of the NTT-UT system in NTCIR-9 by: a new English dependency parser (for EJ task), a syntactic rule-based pre-ordering (for JE task), and a syntax-based post-ordering (for JE task). Our system ranked 1st in EJ subtask both in automatic and subjective evaluation, and was the best SMT system in JE s...
Using Parallel Corpora to Automatically Generate Training Data for Chinese Segmenters in NTCIR PatentMT Tasks
Chinese texts do not contain spaces as word separators like English and many alphabetic languages. To use Moses to train translation models, we must segment Chinese texts into sequences of Chinese words. Increasingly more software tools for Chinese segmentation are populated on the Internet in recent years. However, some of these tools were trained with general texts, so might not handle domain...
HPB SMT of FRDC Assisted by Paraphrasing for the NTCIR-9 PatentMT
This paper describes the FRDC machine translation system for the NTCIR-9 PatentMT. The FRDC system JIANZHEN is a hierarchical phrase-based (HPB) translation system. We participated in all the three subtasks, i.e., Chinese to English, Japanese to English and English to Japanese. In this paper, we introduce a novel paraphrasing mechanism to handle a certain kind of Chinese sentences whos...